A Self-Organizing Neural Model for Motor Equivalent Phoneme Production
Author
Abstract
This paper describes a model of speech production called DIVA that highlights issues of self-organization and motor equivalent production of phonological units. The model uses a circular reaction strategy to learn two mappings between three levels of representation. Data on the plasticity of phonemic perceptual boundaries motivate a learned mapping between phoneme representations and vocal tract variables. A second mapping between vocal tract variables and articulator movements is also learned. To achieve the flexible control made possible by the redundancy of this mapping, desired directions in vocal tract configuration space are mapped into articulator velocity commands. Because each vocal tract direction cell learns to activate several articulator velocities during babbling, the model provides a natural account of the formation of coordinative structures. Model simulations show automatic compensation for unexpected constraints despite no previous experience or learning under these constraints.

Overview of the DIVA model

Production of an acoustic signal that invariantly conveys a particular phoneme to listeners is carried out with large variability in articulator movements from one instance to the next (e.g., [1], [3], [5]). The process of producing an invariant result in a motor system despite large variations in the contributions of individual components from trial to trial is called motor equivalence. This paper investigates the relationship between phonological units and the motor actions that realize them, focusing on how motor plans to produce phonemes can be learned, and how these plans can lead to motor equivalent phoneme production. Figure 1a is a block diagram of the current model, which is named DIVA because a key component is a transformation from Directions (in vocal tract configuration space) Into Velocities of Articulators. This model assumes the existence of a neural representation of phonological units, or phoneme map, for the production of speech.
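As a minimal sketch of the directions-into-velocities idea, not the model's actual learned neural network, the transformation can be illustrated with a small linear example. The Jacobian `J` below is a hypothetical stand-in for the mapping DIVA learns during babbling; the dimensions (3 vocal tract variables, 5 articulators) are illustrative assumptions.

```python
import numpy as np

# Toy forward model: changes in 5 articulator positions (theta) produce
# changes in 3 vocal tract variables (x), via dx = J @ dtheta.
# J is a hypothetical fixed Jacobian standing in for the learned mapping.
rng = np.random.default_rng(0)
J = rng.normal(size=(3, 5))

def direction_to_velocities(desired_direction):
    """Map a desired direction in vocal tract configuration space
    into articulator velocity commands (minimum-norm solution)."""
    return np.linalg.pinv(J) @ desired_direction

dx = np.array([0.2, -0.1, 0.0])       # desired vocal tract direction
dtheta = direction_to_velocities(dx)  # velocity commands for all 5 articulators

# The commanded articulator velocities reproduce the desired direction,
# distributing the movement across the redundant articulator set.
assert np.allclose(J @ dtheta, dx)
```

Because the system is redundant (more articulators than vocal tract variables), many velocity vectors realize the same direction; the pseudoinverse here simply picks the minimum-norm one, echoing how a single vocal tract direction cell drives several articulators at once.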
These units are each associated with a plan consisting of a set of synaptic weights that encode target values of vocal tract variables determining key acoustic properties of the vocal tract. These plans are then converted into appropriate articulator movements. The chosen articulator movements depend on context and external conditions, resulting in motor variability of the kind seen in human speech. In addition, DIVA assumes the existence of a phoneme recognition system that transforms an appropriate incoming acoustic signal into a representation of the corresponding phonological unit. The model contains two learned mappings, indicated by filled semicircles in the figure: a mapping between phoneme representations and corresponding vocal tract configurations (or motor plans), and a mapping between vocal tract variables and articulator movements that realize desired vocal tract configurations. The levels of representation in DIVA are very similar to those of the speech production model of [6]. However, the nature of the mappings between these levels differs. Furthermore, the mappings in [6] were predefined by the modelers. In DIVA, emphasis is placed on how these mappings can self-organize; that is, how can an infant's speech production system learn the parameters governing this complex dynamical system through self-generated babbling without an external teacher?

Mapping from phoneme representations to vocal tract configurations

The variability in articulator positions seen during production of the same phoneme speaks against explicit control of the spatial positions of speech articulators. Instead, direct, invariant control of higher-level variables such as bilabial separation or tongue body constriction seems to be used.
This result is not surprising: whereas these higher-level variables, or vocal tract variables, directly correspond to the acoustic properties of the vocal tract, the effect of an individual articulator depends on the locations of the other articulators. An efficient controller should exploit the flexibility afforded by the redundant set of articulators to invariantly produce acoustic information under a variety of circumstances.

This work was supported in part by grants NSF IRI-87-16960 and NSF IRI-90-24877.

[Figure 1a: block diagram of the DIVA model, showing the Phoneme Map and Vocal Tract Configuration levels.]
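The automatic compensation reported in the simulations can be illustrated with a self-contained toy linear sketch (all quantities hypothetical, not the model's learned weights): when one articulator is unexpectedly blocked, as in bite-block experiments, the remaining articulators can still realize the same vocal tract direction.

```python
import numpy as np

# Hypothetical fixed Jacobian relating 5 articulator velocities to
# changes in 3 vocal tract variables: dx = J @ dtheta.
rng = np.random.default_rng(0)
J = rng.normal(size=(3, 5))
dx = np.array([0.2, -0.1, 0.0])  # desired vocal tract direction

# Normal condition: all articulators free (minimum-norm command).
free = np.linalg.pinv(J) @ dx

# Constrained condition: articulator 0 is clamped (velocity forced to 0);
# the command is recomputed over the remaining articulators only.
J_reduced = J[:, 1:]
clamped = np.zeros(5)
clamped[1:] = np.linalg.pinv(J_reduced) @ dx

# Both commands realize the same vocal tract direction with different
# articulator contributions -- motor equivalence without any relearning.
assert np.allclose(J @ free, dx)
assert np.allclose(J @ clamped, dx)
```

The key point of the sketch is that no new learning is needed under the constraint: the redundancy of the articulator-to-vocal-tract mapping alone provides the compensation.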
Similar papers
A Neural Network Model Of Speech Acquisition And Motor Equivalent Speech Production Running title: Speech acquisition and motor equivalence
This article describes a neural network model that addresses the acquisition of speaking skills by infants and subsequent motor equivalent production of speech sounds. The model learns two mappings during a babbling phase. A phonetic-to-orosensory mapping specifies a vocal tract target for each speech sound; these targets take the form of convex regions in orosensory coordinates defining the sh...
Cooperative Growing Hierarchical Recurrent Self Organizing Model for Phoneme Recognition
Among the large number of research publications discussing the SOM (Self-Organizing Map) [1, 2, 18, 19], different variants and extensions have been introduced. One of the SOM based models is the Growing Hierarchical Self-Organizing Map (GHSOM) [3-6]. The GHSOM is a neural architecture combining the advantages of two principal extensions of the self-organizing map, dynamic growth and hierarchica...
Self-organizing letter code-book for text-to-phoneme neural network model
This paper describes an improved input coding method for a textto-phoneme (TTP) neural network model for speaker independent speech recognition systems. The code-book is self-organizing and is jointly optimized with the TTP model ensuring that the coding is optimal in terms of overall performance. The codebook is based on a set of single layer neural networks with shared weights. Experiments sh...
New variant of the Self Organizing Map in Pulsed Neural Networks to Improve Phoneme Recognition in Continuous Speech
Speech recognition has gradually improved over the years, phoneme recognition in particular. Phoneme recognition plays a very important role in speech processing. Phoneme strings are a basic representation for automatic language recognition, and it has been shown that language recognition results are highly correlated with phoneme recognition results. Nowadays, many recognizers are based on artificial ne...
Towards a neurocomputational model of speech production and perception
The limitation in performance of current speech synthesis and speech recognition systems may result from the fact that these systems are not designed with respect to the human neural processes of speech production and perception. A neurocomputational model of speech production and perception is introduced which is organized with respect to human neural processes of speech production and percept...